Abstract


Robust and consistent estimation of the location parameter
of an asymmetric distribution and general, non-location and
scale parameter estimation problems have been vexing problems
in the history of robustness studies. The minimum distance (MD)
estimation method is shown to provide a heuristically reasonable
mode of attack for these problems which also leads to excellent
robustness properties. Both asymptotic and Monte Carlo results
for the familiar case of estimation of the location parameter
of a symmetric distribution support this proposition, showing
MD-estimators to be competitive with some of the better
estimators thus far proposed.

Key  Words:  Robust  estimation;  Minimum  distance;  Non-location 
and  scale  problems;  Influence  curve;  Swindle. 
1.  INTRODUCTION  AND  HISTORY


A major concern in much recent statistical literature has
been robust estimation, i.e. efficient or nearly efficient
(at a model) estimation procedures which also perform well
under moderate deviations from that model. Huber (1964) has
proposed a class of M-estimators as solutions to a formally
stated minimax problem of this type. However, as stated in
Huber (1972, Sec. 12.3) and Huber (1977), problems occur when
the attempt is made to extend these methods (highly successful
when invariance and symmetry properties are present) to shape
or truncation parameter models. Thus, there is a need for
procedures which extend easily to the more difficult situations.

Wolfowitz (1957) published a fundamental paper outlining
the minimum distance method, proving a consistency result, and
giving a number of intriguing examples of its use. Interestingly,
the motivation for his work was the existence of complex
estimation problems, then unsolved via other methods. Knüsel (1969)
examined the relationship of robustness considerations to the
method of minimum distance (henceforth called the MD-method).
For the particular discrepancy function studied most closely
(which apparently requires numerical integration for its
evaluation) he showed that his D-estimators belong to the class of
M-estimators. Littell and Rao (1975) and Rao, Schuster, and
Littell (1975) have considered in some detail the use of the
Kolmogorov distance for MD-estimation, emphasizing the two-sample
shift problem but also addressing and obtaining results for the
one-sample location case. Holm (in the discussion of Bickel
(1976)) has suggested MD-estimation as being the most natural
method for some robustness problems, and a recent paper by
Easterling (1976) approaches MD-estimation from the point of
view of consonance regions in order to incorporate
goodness-of-fit considerations directly into the problem of
parameter estimation.


2.  NOTATION  AND  DEFINITIONS 

Several measures of the discrepancy between an empirical
distribution function and a theoretical one are of special
interest in this work. In the following we let $G_n(\cdot)$ denote the
empirical distribution function based upon a random sample of
size n from the (possibly unknown) true distribution function
$G(\cdot)$, and $\Gamma = \{F_\theta(\cdot),\ \theta \in \Omega\}$, where $\Omega$ is some parameter space.
Most of the discrepancies considered here are in use as
goodness-of-fit statistics based upon the empirical distribution function
(for surveys see Stephens (1974), Sahler (1968)). Let K and
L denote two distribution functions with support a (common)
subset of R. A list of some measures of interest follows:

i)  $D_\psi(K,L) = \sup_{x \in R} |K(x) - L(x)|\,\psi(L(x))$, the weighted
    Kolmogorov distance, with the uniform weighting function
    $\psi(\cdot) = 1$ of special interest.

ii)  $W_\psi^2(K,L) = \int_{-\infty}^{\infty} (K(x) - L(x))^2\,\psi(L(x))\,dL(x)$, the weighted
    Cramér-von Mises distance with the special weight
    functions of interest

    a)  $\psi(u) = 1$, yielding the Cramér-von Mises statistic
        $W^2(K,L)$,

    b)  $\psi(u) = 1/(u(1-u))$, $0 < u < 1$, yielding the
        Anderson-Darling statistic $A^2(K,L)$, and

    c)  $\psi(u) = 1$ for $\epsilon < u < 1 - \epsilon$ and $\psi(u) = 0$ otherwise,
        for some $\epsilon$ with $0 < \epsilon < 1/2$, yielding a trimmed
        Cramér-von Mises distance as suggested by Anderson
        and Darling (1952).

iii)  $V(K,L) = \sup_{-\infty < a < b < \infty} |(K(b) - K(a)) - (L(b) - L(a))|$,
    Kuiper's maximal interval probability distance.

iv)  $Z_{a,b}^2(K,L) = a \int_{-\infty}^{\infty} (K(x) - L(x))^2\,dL(x) + b\left[\int_{-\infty}^{\infty} (K(x) - L(x))\,dL(x)\right]^2$,
    a class of discrepancies including

     a     b    Discrepancy
     1     0    Cramér-von Mises $W^2(K,L)$
     1    -1    Watson's $U^2(K,L)$
     0     1    Chapman

We shall use $\delta(K,L)$ as a generic symbol for any such
discrepancy. For all $\delta(\cdot,\cdot)$ to be considered, $\delta(K,L)$ is invariant
under 1-1 transformations of the parameter space and monotone
transformations of the sample space. It should be noted that of
the above, the weighted sup-type discrepancies and those of
integral type will not be metrics except in a few special cases.
Simple computational formulae are given for many of the above
(when $K(\cdot)$ is an empirical distribution function) by Stephens
(1974).
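
As an illustration of such computational formulae, the following is a minimal Python sketch (not taken from the paper) of the standard order-statistic expressions for several of the discrepancies listed above, with $K$ an empirical distribution function and $L$ a continuous model. The function name `edf_discrepancies` and its arguments are hypothetical. Following the usual goodness-of-fit convention, the values returned for $W^2$ and $U^2$ are n times the integrals in ii) and iv); this rescaling does not change the minimizing parameter value.

```python
def edf_discrepancies(sample, cdf):
    """Discrepancies between the empirical df of `sample` and a continuous model `cdf`.

    Hypothetical helper: `cdf` is any callable returning F(x) in [0, 1].
    """
    # Probability-integral transforms of the order statistics: u_(1) <= ... <= u_(n).
    u = sorted(cdf(x) for x in sample)
    n = len(u)
    # One-sided Kolmogorov statistics (enumerate is 0-based, hence i + 1 and i).
    d_plus = max((i + 1) / n - ui for i, ui in enumerate(u))
    d_minus = max(ui - i / n for i, ui in enumerate(u))
    # n times the Cramer-von Mises integral, psi = 1.
    w2 = sum((ui - (2 * i + 1) / (2 * n)) ** 2 for i, ui in enumerate(u)) + 1 / (12 * n)
    u_bar = sum(u) / n
    return {
        "D": max(d_plus, d_minus),          # Kolmogorov distance
        "V": d_plus + d_minus,              # Kuiper distance
        "W2": w2,                           # n * Z^2 with a = 1, b = 0
        "U2": w2 - n * (u_bar - 0.5) ** 2,  # n * Z^2 with a = 1, b = -1 (Watson)
    }
```

For example, `edf_discrepancies(data, statistics.NormalDist(0, 1).cdf)` would compare a sample with the standard normal model.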

Loosely, a D-estimator will be defined as a value $\hat\theta \in \Omega$
such that

\[ \delta(G_n, F_{\hat\theta}) = \inf_{\theta \in \Omega} \delta(G_n, F_\theta). \tag{2.1} \]

Suitable precautions will of course have to be taken regarding
attainment of the infimum in $\Omega$. It may well be inquired
why an estimator obtained by minimization of a discrepancy
measure which is useful for goodness-of-fit purposes (and, hence,
in many cases extremely sensitive to outliers or general
discrepancies from the model) should be hoped to possess any desirable
"robustness" properties. It turns out that, in most cases
(although not for, say, $A^2$), while the discrepancy measure itself
may be fairly sensitive to the presence of outliers, the value
$\hat\theta$ which minimizes the discrepancy $\delta(G_n, F_\theta)$ is much less so.
(Monte Carlo results will be given in Section 4 to support the
intuition.) However, if the invariance restrictions on $\delta(\cdot,\cdot)$
are relaxed, the sample mean $\bar{x}$ may be obtained as the D-estimator
corresponding to

\[ \delta(G_n, F_\theta) = \left(\int_{-\infty}^{\infty} x\,d(G_n - F_\theta)(x)\right)^2 = \left(\int_{-\infty}^{\infty} (G_n(x) - F_\theta(x))\,dx\right)^2, \]

where $\Gamma = \{F_\theta,\ \theta \in \Omega\}$ is a set of distributions indexed by their
first moments, i.e. $E_\theta[X] = \theta$. Note that $\delta(\cdot,\cdot)$ as specified
here will not be invariant under monotone transformations on
the sample space.

To suggest that the nature of MD-estimators is to select
in $\Gamma$ a best approximation to $G_n$, we shall refer to $\Gamma$ as being
the "correct projection family" if the true distribution $G \in \Gamma$,
and otherwise as the "incorrect projection family". Note that
there may be more than one value in $\Omega$ for which the infimum in
(2.1) is attained, and that there is no guarantee that the
infimum will be attained in the interior of $\Omega$. Thus, we are
forced to make the following our general definition of a sequence
of MD-estimators.

Definition 2.1. A sequence of random variables $\{T_n\}_{n=1}^{\infty}$ is a
sequence of asymptotic minimum distance estimators based on
$\{G_n\}_{n=1}^{\infty}$ with respect to $\delta(\cdot,\cdot)$ and $\Gamma$ if

i)  $T_n \in \Omega$ for all $n \ge 1$

and

ii)  there exists a nonnegative function $K(n)$ with
    $\lim_{n\to\infty} K(n) = 0$ such that

    $\delta(G_n, F_{T_n}) \le \inf_{\theta \in \Omega} \delta(G_n, F_\theta) + K(n)$ for all $n \ge 1$.

Similar structure has been used in this setting by Wolfowitz
(1957) and Sahler (1970). The following consistency theorem
holds by a straightforward argument.

Theorem 2.1. With all notation as above, if $\{T_n\}_{n=1}^{\infty}$ is a
sequence of asymptotic MD-estimators based on $\{G_n\}_{n=1}^{\infty}$ with respect
to $\delta(\cdot,\cdot)$ and the model $\Gamma$, and $\delta$, $G$, and $\Gamma$ satisfy:

i)  for any sequence $\{H_n\}_{n=1}^{\infty}$,
    $\sup_x |H_n(x) - G(x)| \to 0$ implies $\delta(H_n, F_\theta) \to \delta(G, F_\theta)$
    uniformly over $\Omega$,

ii)  there is a point $\theta_0 \in \Omega$ such that
    $\inf_{\theta \in \Omega} \delta(G, F_\theta) = \delta(G, F_{\theta_0})$,

iii)  $\lim_{k\to\infty} \delta(G, F_{\theta_k}) = \delta(G, F_{\theta_0})$ implies $\lim_{k\to\infty} \theta_k = \theta_0$,

then $\lim_{n\to\infty} T_n = \theta_0$ with probability 1.

Some  points  worthy  of  note  are  the  following: 

1)  This result is simply a statement of sufficient conditions
for continuity of the functional $\hat\theta_n = T(G_n)$ (with the Kolmogorov
metric on the space of distribution functions) at G, which has
been considered as a robustness property in itself (Bickel and
Lehmann (1975a), Fu (1976), Hampel (1971), with use of the
Prokhorov norm).

2)  Sequences $\{G_n\}_{n=1}^{\infty}$ of functions other than empirical
distribution functions are covered by the proof of the theorem. This
is useful for a differential-type approach to the demonstration
of asymptotic normality.

3)  The major condition is i), requiring uniformity of the
convergence over $\Omega$. The theorem is presented in this fashion as
most conducive to intuitive insight. The condition can of course
be easily relaxed to local uniformity of the convergence. The
conditions incidentally cover most cases of location parameter
estimation (scale known) and are easily verified (especially
in the correct projection family case). Condition ii) merely
specifies the uniqueness of the value $\theta_0 \in \Omega$ minimizing $\delta(G, F_\theta)$,
while iii) requires that the parametrization of $\Gamma$ (and choice
of $\delta(\cdot,\cdot)$) be sensible: in order to get $\delta(G, F_\theta)$ arbitrarily
close to $\delta(G, F_{\theta_0})$, one must take $\theta$ sufficiently close to $\theta_0$.

A similar theorem was published in Wolfowitz (1957) for a
particular choice of $\delta(\cdot,\cdot)$.

MD-estimators share an invariance property with maximum-
likelihood estimators in that $\widehat{g(\theta)} = g(\hat\theta)$, e.g. an
MD-estimator of $\mu^2$ for a $N(\mu, \sigma^2)$ population is thus $(\hat\mu)^2$, where
$\hat\mu$ denotes an MD-estimator for $\mu$. Thus, MD-estimation is invariant
to choice of the function $g(\theta)$ of the point $\theta \in \Omega$ to be
estimated, contrary to the case for UMVU estimation methods.
It operates in a manner analogous to maximum likelihood methods
in simply selecting a "best approximating distribution" from
those in the model. (See Fisher (1973, p. 146) in regard to the
desirableness of this property.)

3.  LOCATION  PARAMETER  ESTIMATION 

3.1  Symmetric  Parent,  scale  known. 

In this section we let $\Gamma$ be a translation family of
symmetric continuous distributions, i.e.
$\Gamma = \{F_\theta : F_\theta(x) = F(x - \theta),\ -\infty < \theta < \infty,\ -\infty < x < \infty,$ and $F(x) = 1 - F(-x),\ -\infty < x < \infty\}$, and
assume also that G, the sampled distribution, is symmetric.
Also let $G_n$ denote the empirical distribution function for a
random sample of size n from G.

Although influence curves as in Hampel (1974) can be easily
derived in the general case, we give explicit solutions only
for the case $G \in \Gamma$, taking $G = F$ without loss of generality.
In fact, the case $G \notin \Gamma$ seems to possess little if any meaning
or significance unless scale is estimated simultaneously.
Influence curves for minimum D and V estimators are not
obtainable by our methods (and may well not exist). The MD-estimators
of location obtained by using D and V as discrepancies are not
even asymptotically normal in the simplest cases. (Littell
and Rao (1975), Bolthausen (1977) show asymptotic equivalence
of the MD-estimator based on D to a complicated (and clearly
nonnormal) function of a Brownian bridge.) We consider the use
of discrepancies of the form $Z_{a,b}^2$ as a rather general class
including the Cramér-von Mises, Watson's $U^2$, and Chapman
discrepancies as special cases. The usual implicit
differentiation yields as the influence curve for the derived estimator

\[ IC_{T,F}(c) = \frac{a \int_{-\infty}^{\infty} (F(x) - \delta_c(x))\,f^2(x)\,dx + b\left[\int_{-\infty}^{\infty} (F(x) - \delta_c(x))\,f(x)\,dx\right] \int_{-\infty}^{\infty} f^2(x)\,dx}{a \int_{-\infty}^{\infty} f^3(x)\,dx + b\left[\int_{-\infty}^{\infty} f^2(x)\,dx\right]^2}, \quad -\infty < c < \infty, \]

where $f$ is the density of $F$ and $\delta_c$ here denotes the distribution
function of a unit point mass at c,


which is a valid expression for all $a \ge 0$, $a + b \ge 0$ with one inequality
strict. At the normal parent, $G = F = \Phi$,

\[ IC_{T,\Phi}(c) = \sqrt{3\pi}\,\left(\Phi(\sqrt{2}\,c) - \tfrac{1}{2}\right), \quad -\infty < c < \infty, \quad \text{for } a = 1,\ b = 0 \]

(wrt the Cramér-von Mises discrepancy), and

\[ IC_{T,\Phi}(c) = \frac{\sqrt{3\pi}\,\left(\Phi(\sqrt{2}\,c) - \Phi(c)\right)}{1 - \sqrt{3}/2}, \quad -\infty < c < \infty, \quad \text{for } a = 1,\ b = -1 \]

(wrt the Watson's $U^2$ discrepancy).

Note in Figure 3.1 that the minimum $U^2$ estimator is redescending at the
normal and other symmetric models as long as $G \in \Gamma$. It should
be mentioned that MD-estimation of location parameters using
$U^2$ as a discrepancy measure is not being advocated here, but
simply being used as an analytically simple and illustrative
example. The fact that $U^2$ is more powerful (as a goodness-of-
fit test) against alternatives involving a scale shift than
against location shifts (see Stephens (1974)) serves as an
indicator that it should be a poor choice as a discrepancy
measure for location estimation, but a good one for scale
estimation.

(Figure 3.1 about here)

Table 3.1 contains gross-error-sensitivities and asymptotic
variances at the normal parent for these two estimators and
some others as tabled in Hampel (1974). The low gross-error-
sensitivity of the minimum $W^2$ estimator (second only to the
median among those tabled by Hampel) is noteworthy, as is the
(expected) high variance of the minimum $U^2$ estimator. It is
somewhat curious that projection onto the normal parent via a
goodness-of-fit distance should lead to estimators with any
robustness at all. The basic principle seems to be that
robustness is due to measuring the discrepancy between observed
data and model in "probability-type" units. In cases such as
the Anderson-Darling discrepancy, where the weight given to
deviations in probability units from the model is high in the
tails of the distribution, drastic sensitivity to incorrect
tail-width specification can be expected. Typical measures of
interest, as exemplified by those listed in Section 2 (excluding
unboundedly weighted Kolmogorov or Cramér-von Mises discrepancies),
assign either equal or less weight to discrepancies between the
model and the data in regions of low probability content for
the model. In fact, the Cramér-von Mises discrepancy drastically
downweights discrepancies in the tails. The "trimmed" versions
of the weighted Cramér-von Mises discrepancy are in fact
designed to further minimize the effect of extreme observations.
The V discrepancy, designed for the goodness-of-fit problem on
the circle, weights all discrepancies equally.


(Table  3.1  about  here) 


TABLE 3.1

Asymptotic Variances and Gross-Error-Sensitivities

Estimator      $\sigma^2$      $\gamma^*$
CVM - N        1.095     1.53
U² - N         1.869     1.90
M              1.000     ∞
25A            1.026     1.86
H(1.5)         1.037     1.73
50%            1.571     1.25
10% Trim       1.060     1.60
H/L            1.047     1.77

Entries in table:
$\sigma^2$ = Asymptotic variance
$\gamma^*$ = Gross-error-sensitivity

Estimators are a minimum Cramér-von Mises
estimator (CVM - N) and a minimum Watson's $U^2$
estimator (U² - N), both projecting onto the
normal location family, the mean (M),
median (50%), and several estimators as
tabled in Hampel (1974). Note that all values
in the table are at the N(0,1) parent.
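
The normal-parent influence curves given above can be checked numerically against the CVM - N and U² - N rows of Table 3.1. The sketch below is an illustration only, not the authors' computation: it assumes the two closed forms for $IC_{T,\Phi}(c)$ as reconstructed in this section, approximates the asymptotic variance $\int IC^2\,d\Phi$ by the trapezoid rule, and takes the gross-error-sensitivity as the largest $|IC(c)|$ on a grid. The helper names (`ic_cvm`, `ic_u2`, `summarize`) are hypothetical.

```python
import math

ROOT_3PI = math.sqrt(3.0 * math.pi)

def Phi(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def phi(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def ic_cvm(c):
    """Influence curve at the normal parent for a = 1, b = 0 (Cramer-von Mises)."""
    return ROOT_3PI * (Phi(math.sqrt(2.0) * c) - 0.5)

def ic_u2(c):
    """Influence curve at the normal parent for a = 1, b = -1 (Watson's U^2)."""
    return ROOT_3PI * (Phi(math.sqrt(2.0) * c) - Phi(c)) / (1.0 - math.sqrt(3.0) / 2.0)

def summarize(ic, lo=-8.0, hi=8.0, steps=4000):
    """Return (asymptotic variance, gross-error-sensitivity) for an influence curve."""
    h = (hi - lo) / steps
    grid = [lo + i * h for i in range(steps + 1)]
    gamma = max(abs(ic(c)) for c in grid)            # sup |IC| over the grid
    vals = [ic(c) ** 2 * phi(c) for c in grid]       # integrand IC^2 * phi
    var = h * (sum(vals) - 0.5 * (vals[0] + vals[-1]))  # trapezoid rule
    return var, gamma

var_cvm, gamma_cvm = summarize(ic_cvm)
var_u2, gamma_u2 = summarize(ic_u2)
```

Under these assumptions the four computed values agree with the tabled entries to within rounding, which also serves as a consistency check on the reconstructed formulas.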
Asymptotic normality of these estimators can be established
by using techniques similar to those used by Boos and Serfling
(1977) for M-estimators. Briefly put, it is necessary only to
show

\[ \left|\,T[G_n] - T[G] - H(G_n)\,D_T[G_n - G]\,\right| = o\left(\|G_n - G\|_\infty\right), \tag{3.1} \]

where $\{G_n\}_{n=1}^{\infty}$ is a sequence such that

\[ \|G_n - G\|_\infty = \sup_x |G_n(x) - G(x)| \to 0, \]

$T[\cdot]$ represents the estimator under consideration as a functional
on (an appropriate subset of) the space of univariate distribution
functions, and $H(\cdot)$ a functional on the same space such that
$\lim_{\|G_n - G\|_\infty \to 0} H(G_n) = 1$, and $D_T(G_n - G)$ is linear, i.e. there
exists a function $\psi(\cdot)$ such that

\[ D_T[G_n - G] = \int_{-\infty}^{\infty} \psi(x)\,d(G_n - G)(x) \]

for the set of $G_n - G$ corresponding to the above collections of
$G_n$. The approach of Boos and Serfling can be closely paralleled
for the most part, leading to the following theorem. Note that
in the above, $\psi(x) = IC_{T,G}(x)$ + an arbitrary constant.
Theorem 3.1. Let $T$, $G_n$, $G$, and $\Gamma$ be defined as above, with F
and G symmetric. If T is an MD-estimator with respect to the
discrepancy $Z_{a,b}^2$ and $\Gamma$, and

i)  $T[G_n] \to \theta_0$

and

ii)  $\int_{-\infty}^{\infty} f^3(x)\,dx < \infty$, $\int_{-\infty}^{\infty} |f'(x)|\,dx < \infty$,

then

\[ \lim_{n\to\infty} P\left[\sqrt{n}\,(T[G_n] - T[G]) \le z\right] = \Phi\left(\frac{z}{\sigma_T}\right), \]

where $\sigma_T^2 = \int IC_{T,G}^2(x)\,dG(x) > 0$. A proof is sketched in the Appendix.


Some  points  for  comment  are  the  following: 

1)  Apart from the conditions for consistency (Theorem 2.1) the
burden of the regularity conditions is carried by the projection
family $\Gamma$, over which the statistician has control, rather than
the unknown distribution function G.

2)  Equation (3.1) is equivalent in most cases to

\[ T[G_n] = T[G] + \int_{-\infty}^{\infty} IC_{T,G}(x)\,d(G_n - G)(x) + o\left(\|G_n - G\|_\infty\right), \]

giving the asymptotic equivalence which justifies (for the
asymptotic case) the usual heuristic interpretations of the
influence curve. This resultant asymptotic expansion thus
extends the normality results of Sahler (1970).

3)  Since differentiability implies continuity and (3.1) is
equivalent to

\[ \lim_{n\to\infty} \frac{\left|\,T[G_n] - T[G] - \int_{-\infty}^{\infty} \psi(x)\,d(G_n - G)(x)\,\right|}{\|G_n - G\|_\infty} = 0 \]

for most $\psi(\cdot)$ (all considered here), a Fréchet-type
differentiability result for estimators derived from $Z_{a,b}^2$ is given which
could be used for a new definition of robustness, somewhat
parallel to but more stringent than that of Beran (1977a), to
be considered in a future paper.

4)  Identical results hold when scale is also unknown and $G \in \Gamma$,
with the case of unknown scale and $G \notin \Gamma$ not yet fully resolved
by the authors.

5)  Parr and DeWet (1979) show asymptotic normality of $T[G_n]$
in the correct model case with weighted Cramér-von Mises
discrepancies for general parameters. The proof would easily
extend to a weighted version of $Z_{a,b}^2$.

6)  The symmetry of F and G was specified only to simplify the
notation. For $G \in \Gamma$ this restriction may be omitted. For $G \notin \Gamma$
and in the absence of symmetry it typically will suffice for
$\delta(G, F_\theta)$ to have a unique minimum $\theta_0$ at which it is suitably
differentiable.


3.2  Scale  Unknown  Cases. 


Typically, all the remarks in Section 3.1 hold for the scale
unknown, location estimation problem when G and F are both
symmetric. Then, the parameter $\theta$ is two-dimensional and location
and scale are estimated simultaneously. Here, the scale estimator
$S(G_n)$ is consistent and asymptotically independent of the location
estimator (see Huber (1972) for a related remark regarding
M-estimators) and thus the asymptotic properties of the location
estimator are the same as if scale were known, i.e. $S(G_n) = S(G)$
for all $G_n$.


4.  MONTE  CARLO  RESULTS  FOR  LOCATION  ESTIMATION

A Monte Carlo investigation of the performance
characteristics of MD-estimators over a wide variety of symmetric
distributions in the location estimation problem (scale unknown) is
reported in this section. This case is the best studied and
understood, permitting direct comparisons with Monte Carlo studies
of other proposals. All computations were performed on the CDC
Cyber 72 at Southern Methodist University.

It was felt that such a study was in order for several
reasons: i) to relate the large-sample theory for MD-estimators
to the practical small-sample situations likely to be encountered,
ii) to explore the behavior of the MD-estimators based on
sup-type discrepancies, for which the large-sample theory is
incomplete, and iii) to bolster the authors' argument that
MD-estimators are easily applicable and may well be good for
more complex parameter estimation problems.

The distributions G for which results are reported (with
abbreviated notation in parentheses) include the standard normal
(N(0,1)), t-distributions with 8, 4, 2, and 1 degrees of
freedom (T(8), T(4), T(2), T(1)), the Laplace distribution (LAP),
fixed proportion (3:1) mixtures of standard normals and slash
(quotients of standard normals and independent uniforms) (3N1S),
fixed proportion mixtures (9:1) of standard normals and normals
with standard deviations of 3 and 10 (10%3N and 10%10N
respectively) and a fixed (equal) proportion mixture of standard
normals and uniform variates with mean 0 and variance 1
(50%U*). All distributions except the last have tailweight
greater than or equal to that of the normal. Generation of the
normal variates was done via the polar method, with all required
uniforms generated by a multiplicative congruential method.
Chi-square variates were generated via the IMSL routine GGCSS.
Primary attention is focused upon sample size n = 20, with a
subset of the above configurations examined for n = 10. The
Princeton Swindle (Gross (1973)) was employed to reduce Monte
Carlo variability for all but the distribution 50%U*.
This highly efficient swindle is based upon variates of the
form $X = Z/Y$ where $Z \sim N(0,1)$ and $Y$ are independent.
Unfortunately the kurtosis, $K(X)$, of such variates satisfies $K(X) \ge K(Z)$,
regardless of the distributions of Z and Y (subject to the
existence of the relevant moments). Also, the swindle does
not appear to extend easily to numerators other than the normal.
Thus, the ideas of this method seem to be presently unusable for
short-tailed populations in general. (See Parr (1979) for an
extension to the uniform case.) All results quoted are based
upon 1000 repetitions.
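
For readers who wish to reproduce a study of this kind, the following Python sketch draws samples from the distributions listed above. It is an illustration under stated assumptions, not the generators used in the study: it relies on Python's `random` module rather than the polar method and IMSL routines, it reads "fixed proportion" as meaning that each sample contains exactly the stated number of contaminating observations, and the function name `draw` is hypothetical. The t and Laplace variates are written in the $Z/Y$ form on which the swindle is based.

```python
import math
import random

def draw(name, n, rng):
    """Draw a sample of size n; `name` is one of the abbreviations used in the text."""
    z = lambda: rng.gauss(0.0, 1.0)
    if name == "N(0,1)":
        return [z() for _ in range(n)]
    if name.startswith("T("):
        # t with k degrees of freedom: Z / sqrt(chi-square_k / k).
        k = int(name[2:-1])
        return [z() / math.sqrt(rng.gammavariate(k / 2.0, 2.0) / k) for _ in range(n)]
    if name == "LAP":
        # Laplace as a normal scale mixture: Z * sqrt(2E), E standard exponential.
        return [z() * math.sqrt(2.0 * rng.expovariate(1.0)) for _ in range(n)]
    # Fixed-proportion mixtures: m contaminating draws, n - m standard normals.
    m = {"3N1S": n // 4, "10%3N": n // 10, "10%10N": n // 10, "50%U*": n // 2}[name]
    if name == "3N1S":
        # Slash: normal over an independent uniform on (0, 1].
        tail = [z() / (1.0 - rng.random()) for _ in range(m)]
    elif name == "50%U*":
        # Uniform with mean 0 and variance 1, i.e. on (-sqrt(3), sqrt(3)).
        r = math.sqrt(3.0)
        tail = [rng.uniform(-r, r) for _ in range(m)]
    else:
        sd = 3.0 if name == "10%3N" else 10.0
        tail = [sd * z() for _ in range(m)]
    return [z() for _ in range(n - m)] + tail
```

A call such as `draw("3N1S", 20, random.Random(1))` returns 15 standard normal and 5 slash variates. The $Z/Y$ representation also makes the kurtosis inequality transparent: taking $K$ as the standardized fourth moment, $E[X^4]/(E[X^2])^2 = 3\,E[Y^{-4}]/(E[Y^{-2}])^2 \ge 3 = K(Z)$ by the Cauchy-Schwarz inequality.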

Table 4.1 is a glossary for the estimators for which
performance measures are given later. MD-estimators with "fixed"
scale estimation utilize the (properly scaled for the
projection model) sample interquartile range as a scale estimate
and minimize the discrepancy over choice of the location
parameter via the IMSL Fibonacci-type minimization routine
ZXFIB. The estimate is taken to be the sample median when the
minimizing value falls outside of the first and third quartiles.
This routine does not require specification of the derivatives
of the objective function with respect to the parameters, and
thus is not the most efficient choice in general. The authors
chose it, however, to demonstrate the reasonable practicality
of the MD-method, which requires no special routines beyond a
function to compute $F_\theta(\cdot)$, one to compute $\delta(F_\theta, G_n)$, and an
omnibus minimization routine. In spite of this, the routine
converged to within an accuracy of .005 for $T[G_n]$ rather
quickly for CVM-N, converging in an average of .115 seconds
(typically 12-14 iterations) for the N(0,1) parent, .120
seconds (12-14 iterations) for a Cauchy parent, and .115
seconds (12-14 iterations) for a Laplace parent. These compared
to typical times for the M-estimators of .005-.006 seconds for all three
parents. The average cost of any single estimator studied
was less than one cent at the current rates for the SMU Cyber 72.
Subsequent experimentation with rational function
approximations to the normal cumulative distribution function
reduced the average times for the MD-estimators
by a factor of two from those times quoted above.
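
The fixed-scale procedure just described can be sketched in a few lines of Python. This is a minimal illustration, not the authors' program: it uses a golden-section search in place of the IMSL routine ZXFIB, restricts the search to the interval between the sample quartiles, and returns the median when the search ends at a boundary of that interval, which is one way to mimic the fallback rule described above. The function names, the quartile convention of `statistics.quantiles`, and the tolerance are assumptions made for the example.

```python
import statistics
from statistics import NormalDist

def cvm_normal(sample, loc, scale):
    """n times the Cramer-von Mises discrepancy between the sample edf and N(loc, scale^2)."""
    model = NormalDist(loc, scale)
    u = [model.cdf(x) for x in sorted(sample)]
    n = len(u)
    return sum((ui - (2 * i + 1) / (2 * n)) ** 2 for i, ui in enumerate(u)) + 1 / (12 * n)

def md_location_fixed_scale(sample, tol=1e-6):
    """CVM-N location estimate with scale fixed at the rescaled interquartile range."""
    q1, med, q3 = statistics.quantiles(sample, n=4)
    scale = (q3 - q1) / 1.349        # the IQR of N(0,1) is about 1.349
    if scale <= 0.0:
        return med                   # degenerate sample: fall back to the median
    lo, hi = q1, q3
    g = (5 ** 0.5 - 1) / 2           # golden-section ratio
    while hi - lo > tol:
        c = hi - g * (hi - lo)
        d = lo + g * (hi - lo)
        if cvm_normal(sample, c, scale) < cvm_normal(sample, d, scale):
            hi = d                   # minimum lies in [lo, d]
        else:
            lo = c                   # minimum lies in [c, hi]
    est = 0.5 * (lo + hi)
    # A search that ends at a quartile suggests the minimizer lies outside them.
    if est - q1 <= tol or q3 - est <= tol:
        return med
    return est

est = md_location_fixed_scale([1.0, 2.0, 2.5, 3.0, 3.5, 4.0, 5.0])
est_outlier = md_location_fixed_scale([1.0, 2.0, 2.5, 3.0, 3.5, 4.0, 1000.0])
```

As in the text, nothing beyond a model distribution function, a discrepancy, and a derivative-free one-dimensional minimizer is needed. Replacing the largest observation by 1000 leaves the estimate between the sample quartiles, in line with the robustness discussion of Section 3.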

MD-estimators with "simultaneous" scale estimation initialize
the location and scale parameters at the median and rescaled
interquartile range, minimizing the discrepancy jointly in the
two parameters via the IMSL quasi-Newton ZXMIN algorithm. This
routine approximates the derivatives of $\delta(G_n, F_\theta)$ with respect
to $\theta$ numerically. No initial estimate of the Hessian matrix is
required. As before, ZXMIN is a good omnibus minimization routine
chosen to demonstrate the ease of implementation of MD-estimators.
For this routine, CVM-N converged in average times roughly
twice those for fixed scale estimation.

For the Cramér-von Mises type discrepancies used herein
(including the trimmed ones), verification that Theorem 3.1 holds
is a matter of showing 1) that the model density obeys (ii) of
that theorem (trivial for the models considered in this Monte
Carlo study), 2) that $\sigma_T^2 > 0$, and 3) the consistency condition. The
other MD-estimators, based upon the Kolmogorov and Kuiper
discrepancies, do not have asymptotically normal distributions, as
mentioned in Section 3.1.

(Table  4.1  about  here) 

The outer mean OM (the average of the 25% largest and 25%
smallest values in the sample) was included to demonstrate the
drastic inefficiency of existing proposed robust estimators
for short-tailed situations. All other estimators have
mnemonics as in Andrews et al. (1972) (H10, H15, H20, 12A, 17A,
21A, 22A, 25A, HGP, GAS, 50%, M) and are computed as in routines
contained therein. The Hampels and Hubers were included as
families including some of the best and most studied estimators
in the literature.

Entries in Tables 4.2a and b are 20 times (estimated
variance of estimator). An approximate standard error for each
entry in a given column can be obtained as $SE_T = .0447\,(\text{Entry} - a)$,
where a is the value in the last row of that column and is
related to the savings due to the swindle. More digits than
are often significant are included since the blocking effect due
to using the same samples across all estimators makes qualitative
comparisons of different estimators at the same distribution
more precise. It should be mentioned that the swindle, which
is responsible for a > 0 in all but the short-tailed case,
produces more precise variance estimates for more efficient
estimators and for more near-normal distributions. Table 4.3
contains similar results at n = 10 for a smaller set of
distributions. Comparisons with both exact theoretical values and
previous Monte Carlo work bolster faith in the estimated variances
and their approximate standard errors.

(Tables  4.2a,  b  and  4.3  about  here) 
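
As a worked example of the standard-error rule quoted above, the short Python sketch below evaluates $.0447\,(\text{Entry} - a)$ for a single table entry. The function name `approx_se` is hypothetical. The observation that .0447 is numerically close to $\sqrt{2/1000}$, the usual relative standard error of a variance estimate from 1000 normal-theory replications, is our reading and is not stated in the text.

```python
import math

def approx_se(entry, a, factor=0.0447):
    """Approximate standard error of a tabled entry: factor * (entry - a)."""
    return factor * (entry - a)

# CVM-N at the N(0,1) parent in Table 4.2a: entry 1.0595, last-row value a = 1.0000.
se_cvm_normal = approx_se(1.0595, 1.0000)

# For comparison only: sqrt(2 / 1000) is close to the quoted factor .0447.
factor_check = math.sqrt(2.0 / 1000.0)
```

For this entry the rule gives a standard error of roughly .0027, so differences in the third decimal place between estimators at the normal parent are of the same order as the simulation error, apart from the blocking effect noted above.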

Several points are worthy of note based upon a general
inspection of the tables. Distributions not examined in the
Princeton Robustness Study (PRS) are 50%U*, T(8), T(4), and
T(2). In general, the MD-estimators seem to fare extremely well
for all but the most drastic heavy-tailed alternatives to
normality, in comparison with even the best of the M-estimators
considered here. A perusal of the relative behavior of MD-
estimators using fixed or simultaneous scale estimation reveals
the simultaneous estimation of scale to be profitable when the
sampled distribution G is not near the projection family $\Gamma$, but
in fact a liability otherwise.


TABLE 4.2a  MONTE CARLO VARIANCES FOR LOCATION ESTIMATORS

n = 20

                              Population
Estimator        N(0,1)     T(8)     T(4)      T(2)     T(1)

Fixed Scale
CVM-N            1.0595   1.2383   1.4621    2.1044   4.3926
TCVM-N-.10       1.0648   1.2415   1.4605    2.0891   4.3156
TCVM-N-.20       1.0913   1.2550   1.4526    2.0344   4.0872
CVM-T(4)         1.0912   1.2531   1.4497    1.9956   3.8347
KS-N             1.0871   1.3125   1.5835    2.4439   5.6664
V-N              1.9486   2.1242   2.2025    2.3516   3.1590
U2-N             1.4052   1.5306   1.6297    1.8919   2.7428

Simultaneous Scale
SCVM-N           1.0852   1.2478   1.4484    1.9950   3.6514
STCVM-N-.10      1.1107   1.2671   1.4519    1.9436   3.3410
STCVM-N-.20      1.1722   1.3122   1.4719    1.8943   3.0350
SKS-N            1.1257   1.2993   1.4996    2.1239   4.5855
SV-N             1.9960   2.1095   2.0761    2.1872   2.8649
SU2-N            1.8270   1.8569   1.8462    1.9999   2.5863

M                1.0000   1.3126   2.0596   10.8307   ******
50%              1.4571   1.5897   1.7297    1.9861   2.7777
GAS              1.2102   1.3446   1.4939    1.8905   3.1305
OM               1.1754   1.8529   3.9725   35.7840   ******
HGP              1.0290   1.3180   1.6561    2.4014   3.7346
H10              1.0979   1.2571   1.4498    1.9902   3.7026
H15              1.0363   1.2390   1.5120    2.3343   5.7788
H20              1.0135   1.2520   1.5985    2.6966   8.5473
12A              1.2006   1.3275   1.4829    1.8908   2.7843
17A              1.1073   1.2764   1.4681    1.9584   3.0951
21A              1.0672   1.2599   1.4935    2.0814   3.4441
22A              1.0905   1.2899   1.5231    2.0949   3.4714
25A              1.0389   1.2499   1.5215    2.2007   3.8362

a                1.0000   1.0127   1.0256    1.0526   1.1111


TABLE 4.2b  MONTE CARLO VARIANCES FOR LOCATION ESTIMATORS

n = 20

                              Population
Estimator           LAP    10%3N    10%10N     3N1S    50%U*

Fixed Scale
CVM-N            1.4112   1.3091    1.4571   1.6174   1.2215
TCVM-N-.10       1.4023   1.3108    1.4522   1.6107   1.2275
TCVM-N-.20       1.3703   1.3177    1.4355   1.5914   1.3061
CVM-T(4)         1.3509   1.3244    1.4446   1.5934   1.3113
KS-N             1.5337   1.3832    1.5997   1.8484   1.0972
V-N              1.8125   2.1943    2.0868   2.2660   2.5710
U2-N             1.4035   1.5530    1.4658   1.6537   1.8497

Simultaneous Scale
SCVM-N           1.3599   1.3190    1.4400   1.5867   1.3020
STCVM-N-.10      1.3285   1.3384    1.4482   1.5893   1.3688
STCVM-N-.20      1.3039   1.3883    1.4860   1.6111   1.5216
SKS-N            1.3906   1.3731    1.5074   1.7037   2.8267
SV-N             1.6825   2.1443    1.9482   2.1866   1.3217
SU2-N            1.5274   1.8497    1.6795   1.9054   2.9790

M                2.0450   1.7594   10.2602   ******   1.0158
50%              1.3553   1.6574    1.7422   1.9203   1.9867
GAS              1.3405   1.4204    1.5107   1.6482   1.6096
OM               3.8071   3.2054   33.9266   ******    .8674
HGP              1.6424   1.4532    1.7297   1.8731   1.0139
H10              1.3622   1.3203    1.4292   1.5789   1.3467
H15              1.5386   1.2973    1.4763   1.6932   1.1277
H20              1.6720   1.3407    1.6501   1.9692   1.0506
12A              1.3415   1.3752    1.3481   1.5436   1.5581
17A              1.3857   1.3138    1.2816   1.4788   1.3165
21A              1.4593   1.2929    1.2637   1.4820   1.1922
22A              1.5087   1.3062    1.2610   1.4968   1.2055
25A              1.5208   1.2952    1.2811   1.5165   1.1177

a                 .5220   1.0975    1.1105   1.2018   0.0000
TABLE 4.3  MONTE CARLO VARIANCES FOR LOCATION ESTIMATORS

n = 10

                                          Population
Estimator          N(0,1)       T(4)    Laplace     10%10N       T(2) (unlabeled)

Fixed Scale
CVM-N              1.0783     1.4762     1.4588     1.4425     2.1444     1.4986
TCVM-N-.10         1.0838     1.4780     1.4547     1.4423     2.1325     1.4982
TCVM-N-.20         1.1089     1.4771     1.4293     1.4283     2.0894     1.4944
CVM-T(4)           1.1109     1.4750     1.4087     1.4338     2.0576     1.4944
KS-N               1.0796     1.5770     1.5386     1.5263     2.3515     1.6062
V-N                1.9145     2.1881     1.7790     2.0044     2.4558     2.1384
U2-N               1.4783     1.7256     1.5008     1.4974     2.0316     1.6529

Simultaneous Scale
SCVM-N             1.0871     1.4678     1.4377     1.4332     2.1079     1.4877
STCVM-N-.10        1.1208     1.4770     1.4115     1.4431     2.0715     1.4984
STCVM-N-.20        1.1998     1.5206     1.3791     1.4835     2.0102     1.5560
SKS-N              1.1191     1.5278     1.4348     1.4740     2.1895     1.5516
SV-N               1.9993     2.1807     1.7893     1.9363     2.3932     2.1081
SU2-N              1.8501     2.0062     1.6419     1.6881     2.1940     1.8837

M                  1.0000     2.0491     1.9407    11.1790     9.3433  2283.0178
50%                1.4031     1.7117     1.4045     1.6630     2.0914     1.7393
GAS                1.2288     1.5315     1.3755     1.4964     1.9823     1.5621
OM                 1.1081     3.2184     3.0410    27.0583    21.4474  6335.8547
HGP                1.0411     1.7605     1.8899     1.6689     3.0374     1.9998
H10                1.1014     1.4736     1.4446     1.4174     2.1101     1.4817
H15                1.0339     1.5271     1.6122     1.4731     2.4550     1.5762
H20                1.0120     1.6230     1.7264     1.6600     3.0630     2.0^53
12A                1.2264     1.5509     1.4130     1.3551     1.9482     1.4911
17A                1.1500     1.5009     1.4535     1.2868     2.0192     1.4250
21A                1.1044     1.5087     1.4952     1.2661     2.1189     1.4299
22A                1.1496     1.5483     1.5410     1.2751     2.1237     1.4463
25A                1.0670     1.5174     1.5448     1.2779     2.2402     1.4596
a                  1.0000     1.0526      .6562     1.1099     1.1111     1.1565


Table  4.4  gives  a  smaller  set  of  values  more  amenable  to 
graphical  presentation.  The  entries  are,  for  n  =  20, 

\[ E_i[j] = \mathrm{Var}(T_i \text{ at distribution } j) \,/\, \mathrm{Var}(\text{best } T \text{ at distribution } j), \]
i.e.,  estimated  (efficiency)$^{-1}$  relative  to  the  (empirically 
determined)  best  estimator  for  that  distribution.  This  adjusts 
for  scale  differences  in  the  sampled  populations,  which  for 
example  avoids  the  difficult  matter  of  rescaling  T(2)  to  be  in 
some  sense  comparable  with  N(0,1),  thus  permitting  meaningful 
comparisons  across  distributions.  Furthermore,  based  upon  a 
first-order  approximation,  these  entries  should  have  smaller 
coefficients  of  variation  than  those  in  4.2a  and  b  since  the 
numerators  and  denominators  are  highly  correlated,  both  being 
estimates  of  the  variances  of  fairly  efficient  estimators  based 
upon  the  same  data. 
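
The first-order argument can be made explicit. As a sketch, write a table entry as a ratio X/Y of two variance estimates with coefficients of variation c_X and c_Y and correlation rho; these symbols are introduced here for illustration and are not the paper's notation. A first-order Taylor (delta-method) expansion then gives:

```latex
\[
\mathrm{CV}^2\!\left(\frac{X}{Y}\right) \;\approx\; c_X^2 + c_Y^2 - 2\rho\, c_X c_Y ,
\]
\[
c_X = c_Y = c \;\Longrightarrow\;
\mathrm{CV}^2\!\left(\frac{X}{Y}\right) \;\approx\; 2c^2(1-\rho) \;<\; c^2
\quad\Longleftrightarrow\quad \rho > \tfrac{1}{2} .
\]
```

Under the equal-CV simplification, the ratio is therefore more stable than either variance estimate alone as soon as the correlation exceeds one half.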

The  generally  good  behavior  of  MD-estimators  based  upon 
CVM-type  discrepancies  when  the  sampled  distribution  is  a  t  with 
moderate  degrees  of  freedom  stands  out  as  before.  Figure  4.1 
plots  $E_i[T(4)]$  versus  $E_i[N(0,1)]$  for  a  number  of  the  estimators 
considered.  With  this  plotting  system,  good  estimators  will  lie 
towards  the  bottom  left  of  the  plot.  MD-estimators  utilizing 
the  Kolmogorov  discrepancy  are  clearly  inferior  for  moderately 
longtailed  deviations  from  normality,  while  those  using  CVM-type 
discrepancies  perform  quite  well  both  here  and  under  normality. 
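
The ratio defining the entries, and the "bottom left is better" reading of the plot, can be sketched numerically. The variances below are copied from the N(0,1) and T(4) columns of Table 4.2a for a handful of rows. The helper names `deficiencies` and `dominated` are illustrative. The minimum is taken over the listed rows only, so agreement with the published Table 4.4 is an assumption.

```python
# n = 20 Monte Carlo variances copied from Table 4.2a: (N(0,1), T(4)).
variances = {
    "CVM-N":  (1.0595, 1.4621),
    "SCVM-N": (1.0852, 1.4484),
    "KS-N":   (1.0871, 1.5835),
    "M":      (1.0000, 2.0596),
    "50%":    (1.4571, 1.7297),
    "H15":    (1.0363, 1.5120),
    "25A":    (1.0389, 1.5215),
}


def deficiencies(table):
    """Divide each variance by the smallest variance in its column."""
    ncols = len(next(iter(table.values())))
    best = [min(row[j] for row in table.values()) for j in range(ncols)]
    return {
        name: tuple(v / b for v, b in zip(row, best))
        for name, row in table.items()
    }


def dominated(name, points):
    """True if another estimator is no worse on every axis and better on one."""
    p = points[name]
    return any(
        other != name
        and all(qk <= pk for qk, pk in zip(q, p))
        and any(qk < pk for qk, pk in zip(q, p))
        for other, q in points.items()
    )


E = deficiencies(variances)
for name, (e_norm, e_t4) in sorted(E.items(), key=lambda kv: sum(kv[1])):
    status = "dominated" if dominated(name, E) else "on the bottom-left frontier"
    print(f"{name:7s} E[N(0,1)] = {e_norm:.4f}  E[T(4)] = {e_t4:.4f}  {status}")
```

An estimator flagged as dominated has another listed estimator lying below and to the left of it in the plane of the two deficiencies.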
Figure  4.2  gives  a  similar  plot  for  an  $E_i[T(2)]$  versus  $E_i[N(0,1)]$ 
comparison.  Here,  M-  and  MD-estimation  methods  seem  to  be  equally 
good.  For  heavytailed  symmetric  distributions  beyond  the  T(2) 
(Cauchy,  Slash,  or  mixtures  involving  high  proportions  of  Cauchy 
or  Slash)  the  Hampels  (particularly  12A)  emerge  as  by  far  the 
best  choice.  The  MD-estimators  using  V  and  $U^2$  begin  to  exhibit 
some  merit  in  these  situations,  in  contrast  to  their  disappointing 
behavior  at  the  normal  parent. 

(Table  4.4,  Figures  4.1,  4.2  about  here) 

In  summary,  the  MD-estimators  are  quite  competitive  (while 
still  not  finely  tuned)  for  all  but  the  most  drastic  alternatives 
to  normality.  Furthermore,  additional  study  may  well  reveal  (as 
suggested  by  the  behavior  of  CVM-T(4))  that  the  use  of  moderately 
or  perhaps  drastically  heavytailed  projection  families  $\Gamma$  produces 
MD-estimators  which  work  well  for  this  case  also.  The  behavior 
of  KS  (fixed  scale)  for  50%U*  suggests  some  hope  for  the  shorttailed 
situation  as  well. 

TABLE  4.4  EMPIRICAL  DEFICIENCIES  OF  SELECTED  ESTIMATORS 

FIGURE  4.1  Plot  of  $E_i[T(4)]$  Versus  $E_i[N(0,1)]$  for  Selected  Estimators 

FIGURE  4.2  Plot  of  $E_i[T(2)]$  Versus  $E_i[N(0,1)]$  for  Selected  Estimators 

5.  SUMMARY  AND  CONCLUSIONS 

Both  theoretical  and  Monte  Carlo  results  have  been  given  to 
suggest  that  MD-estimation  is  competitive  with  the  better  of  the 
extant  methods  for  the  simple,  symmetric-location  estimation 
problem.  MD-estimators  seem  to  be,  however,  far  easier  to  apply 
for  more  general  estimation  problems  (strongly  consistent  and 
robust  estimators  are  easy  to  derive  and  compute)  than  M-estimators, 
which  can  become  quite  intractable  in  situations  in  which  these 
symmetry  and  invariance  properties  do  not  hold. 
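
To give a sense of the computation involved, the following is a minimal sketch of a minimum distance location estimate in the spirit of the fixed-scale CVM-N rows: it minimizes the standard computing form of the Cramer-von Mises statistic between the empirical distribution function and N(theta, 1). The unit scale, the golden-section search over the sample range, and the function names are assumptions made for illustration; the study's actual numerical procedure is not reproduced here, and the search only guarantees a local minimum if the discrepancy is not unimodal in theta.

```python
import math


def norm_cdf(z):
    """Standard normal distribution function via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def cvm_discrepancy(theta, sorted_x):
    """Cramer-von Mises statistic of the sample against N(theta, 1)."""
    n = len(sorted_x)
    total = 1.0 / (12.0 * n)
    for i, x in enumerate(sorted_x, start=1):
        total += (norm_cdf(x - theta) - (2.0 * i - 1.0) / (2.0 * n)) ** 2
    return total


def cvm_location(x, tol=1e-10, max_iter=200):
    """Golden-section search for the theta minimizing the discrepancy."""
    xs = sorted(x)
    lo, hi = xs[0], xs[-1]
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    for _ in range(max_iter):
        if hi - lo <= tol:
            break
        c = hi - inv_phi * (hi - lo)
        d = lo + inv_phi * (hi - lo)
        if cvm_discrepancy(c, xs) < cvm_discrepancy(d, xs):
            hi = d  # minimum bracketed in [lo, d]
        else:
            lo = c  # minimum bracketed in [c, hi]
    return 0.5 * (lo + hi)


sample = [0.3, -1.2, 2.5, 0.9, -0.4]
print(f"CVM location estimate: {cvm_location(sample):.6f}")
```

Replacing `norm_cdf` by another distribution function changes the projection family without altering the rest of the sketch, which is one way to read the claim that such estimators are easy to derive and compute.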
APPENDIX  A 

Herein  we  prove  Theorem  3.1  for  the  case  $G = F \in \Gamma$.  The 
other  case,  $G \notin \Gamma$,  may  be  treated  by  a  similar  argument,  but  is 
not  proven  here  due  to  space  considerations. 

Proof  of  Theorem  3.1 
Define 

\[ \lambda_K(c) = \frac{\partial}{\partial\theta}\,\delta(K, F_\theta)\Big|_{\theta = c}, \qquad -\infty < c < \infty, \]

for  all  distribution  functions  K. 

$T[G_n]$  is  thus  a  root  of  $\lambda_{G_n}(T[G_n]) = 0$,  where  we  assume  a  method 
of  selecting  a  consistent  root  (for  instance  the  one  closest  to 
the  median).  Define  also  the  function  (continuous  at  $t = T[F]$) 

\[ h(t) = \begin{cases} \dfrac{\lambda_F(t)}{t - T[F]}, & t \neq T[F], \\[1ex] \lambda_F'(T[F]), & t = T[F]. \end{cases} \]



Simple  calculation  for  $\delta = Z^2_{a,b}$  yields 

\[ \lambda_F'(T[F]) = 2a \int f^3\,d\mu + 2b \Big( \int f^2\,d\mu \Big)^2 . \]
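
As a numerical check of this expression, the sketch below evaluates it when f is the standard normal density and mu is taken to be Lebesgue measure (both assumptions made here for illustration). In that case the two integrals have the closed forms 1/(2 pi sqrt 3) and 1/(2 sqrt pi), so the derivative equals a/(pi sqrt 3) + b/(2 pi). The values of a and b used are illustrative only.

```python
import math


def phi(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def integrate(g, lo=-12.0, hi=12.0, steps=24000):
    """Composite trapezoid rule on [lo, hi]."""
    h = (hi - lo) / steps
    total = 0.5 * (g(lo) + g(hi))
    for k in range(1, steps):
        total += g(lo + k * h)
    return total * h


def lambda_prime(a, b, f=phi):
    """2a * int f^3 + 2b * (int f^2)^2, evaluated numerically."""
    int_f3 = integrate(lambda x: f(x) ** 3)
    int_f2 = integrate(lambda x: f(x) ** 2)
    return 2.0 * a * int_f3 + 2.0 * b * int_f2 ** 2


def lambda_prime_normal_closed_form(a, b):
    """Closed form of the same quantity for the standard normal density."""
    return a / (math.pi * math.sqrt(3.0)) + b / (2.0 * math.pi)


for a, b in [(1.0, 0.0), (1.0, -1.0)]:
    numeric = lambda_prime(a, b)
    exact = lambda_prime_normal_closed_form(a, b)
    print(f"a = {a}, b = {b}: numeric = {numeric:.8f}, closed form = {exact:.8f}")
```

A strictly positive value matters in what follows, since this quantity is the limit of the denominator h(T[G_n]).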

Hence 

\[ T[G_n] - T[F] = \frac{\lambda_F(T[G_n])}{h(T[G_n])}, \]

and  $h(T[G_n]) \to \lambda_F'(T[F])$  with  probability  1. 

We  desire  to  show  that  the  following  is  legitimate  as  an  expression 
for  the  differential: 

\[ D_T(K_n - F) = \frac{2\Big[ a \int (F - K_n) f^2\,d\mu + b \Big( \int f^2\,d\mu \Big) \Big( \int (F - K_n) f\,d\mu \Big) \Big]}{\lambda_F'(T[F])} . \]

Define  also 

\[ H(G_n) = \frac{\lambda_F'(T[F])}{h(T[G_n])}, \]

which  converges  to  1  with  probability  1. 

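We  may  thus  write 

\[ \big| T[G_n] - T[F] - H(G_n) D_T(G_n - F) \big|
 = \left| \frac{1}{h(T[G_n])} \right| \, \left| \lambda_F(T[G_n]) - 2\Big( a \int (F - G_n) f^2\,d\mu + b \Big( \int f^2\,d\mu \Big) \Big( \int (F - G_n) f\,d\mu \Big) \Big) \right| \]

and  must  now  demonstrate  that  this  tends  to  zero  faster  than 
$\|G_n - F\|_\infty$.  The  right  hand  side  reduces  after  considerable 
mathematical  manipulation  to 

\[ \begin{aligned}
\left| \frac{1}{h(T[G_n])} \right| \, \Big| \; & 2a \int (F - G_n)\big( f_{T[G_n]}^2 - f^2 \big)\,d\mu \\
 & + a \int \big[ (G_n - F_{T[G_n]})^2 - (F - F_{T[G_n]})^2 \big] f'_{T[G_n]}\,d\mu \\
 & + 2b \Big[ \Big( \int f^2\,d\mu \Big) \Big( \int \big[ (F_{T[G_n]} - G_n) + (F - F_{T[G_n]}) \big] \big( f_{T[G_n]} - f \big)\,d\mu \Big) \Big] \\
 & + 2b \Big[ \Big( \int (G_n - F_{T[G_n]}) f_{T[G_n]}\,d\mu \Big) \Big( \int (G_n - F_{T[G_n]}) f'_{T[G_n]}\,d\mu \Big) \\
 & \qquad - \Big( \int (F - F_{T[G_n]}) f_{T[G_n]}\,d\mu \Big) \Big( \int (F - F_{T[G_n]}) f'_{T[G_n]}\,d\mu \Big) \Big] \; \Big| \\
 \le \; & \left| \frac{1}{h(T[G_n])} \right| \, o\big( \|G_n - F\|_\infty \big) = o\big( \|G_n - F\|_\infty \big)
\end{aligned} \]
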

by  the  triangle  inequality  and  finiteness  of  $\int |f'|\,d\mu$.  Thus, 
$|T[G_n] - T[F] - H(G_n) D_T(G_n - F)| = o(\|G_n - F\|_\infty)$  and  the  theorem 
follows  immediately  by  the  Lindeberg-Levy  version  of  the  central 
limit  theorem. 

The  case  $G \notin \Gamma$  follows  similarly,  but  with  many  more  terms 
to  be  bounded  in  comparable  fashion. 